1 Credits

This tutorial uses materials that were created by the CANSSI Collaborative Research Team led by Vianey Leos Barajas and Marie Auger-Méthé. I thank Fanny Dupont, Ron Togunov, Natasha Klappstein, Arturo Esquivel, Marco Gallegos Herrada, and Sofia Ruiz Suarez for their contributions to the original material.

2 Tutorial goals

The goal of this tutorial is to explore ways to pre-process marine movement data. The primary learning objectives are to:

  1. Regularise irregular movement tracks, using Fastloc GPS data as an example.
  2. Create predicted tracks from error-prone data using Argos data as an example.

3 General setup

First, let’s load the packages that we will need to complete the analyses. They need to be installed first.

library(momentuHMM) # Package for fitting HMMs, we use it for crawlWrap
library(tidyverse)  # data management
library(ggspatial)  # plot the data
library(sf)         # spatial data processing
library(kableExtra) # produce visually appealing tables
library(geosphere)  # to calculate step lengths from lat/lon locations - just need it installed
library(conicfit)   # needs to be installed for crawlWrap
library(here)       # To help with sourcing
library(adehabitatLT) # setNA
library(terra)
library(tidyterra)
library(aniMotum)
library(mitools)  # used by momentuHMM, just needs to be installed

# install.packages("aniMotum",
#                  repos = c("https://cloud.r-project.org",
#                  "https://ianjonsen.r-universe.dev"),
#                  dependencies = TRUE)

We assume that you are working from the main folder of the Madeira-Workshop repository, and we use here() to find the correct directory for each file. We will need the functions in the following file: utility_functions.R.

source(here("D1-data-prep-ssm",
            "utility_functions.R"))

4 Regularising Fastloc GPS data

4.1 Narwhal movement data

We will analyze a dataset containing movement tracks of three narwhals tagged with Fastloc-GPS tags. The dataset was provided by Dr. Marianne Marcoux (Fisheries and Oceans Canada). Dr. Marcoux provided the data only for this tutorial; please do not use them for other purposes without her consent. Contact: .

For simplicity, we only examine the Fastloc-GPS data from one week in August 2017.

Photo by Paul Nicklen

First, let’s import the raw Fastloc GPS narwhal data and convert the time column to an appropriate date format.

tracks_gps_O <- read.csv(here("D1-data-prep-ssm", "data", "tracks_fastloc_gps.csv")) %>%
  mutate(time = ymd_hms(time),
         ID = factor(ID))

We can have a first look at the data, to get more familiar with it.

head(tracks_gps_O)
##        ID                time        x       y loc_class
## 1 T172062 2017-08-08 00:00:00 -80.6414 72.2694       GPS
## 2 T172062 2017-08-08 00:20:00 -80.6525 72.2728       GPS
## 3 T172062 2017-08-08 00:42:00 -80.6665 72.2719       GPS
## 4 T172062 2017-08-08 00:52:00 -80.6692 72.2674       GPS
## 5 T172062 2017-08-08 01:09:00 -80.6682 72.2582       GPS
## 6 T172062 2017-08-08 01:19:00 -80.6613 72.2574       GPS

The columns/variables are:

  - ID: individual identifier
  - time: time of location
  - x: longitude
  - y: latitude
  - loc_class: location class (all GPS here)

The data we obtain are often quite messy, with some rows/records missing information and other records duplicated. We can filter records to keep only complete location data using !is.na(x) & !is.na(y). To remove duplicate records (same time, location, and location class), we will use the lag function from dplyr, which returns the value from the previous row so that we can compare it to the current row.

tracks_gps_O <- tracks_gps_O %>% 
  # remove missing locations
  filter(!is.na(x) & !is.na(y),
         # remove identical records
         !(time == lag(time) & 
             x == lag(x) & 
             y == lag(y) & 
             loc_class == lag(loc_class)))

Let’s plot the data over the bathymetry (i.e., depth of the ocean) and land layers, to get a sense of where these narwhals are swimming.

land <- st_read(here("D1-data-prep-ssm", "data", "NorthBaffin.shp"))

The raw bathymetry data used to create this raster were taken from the GEBCO global ocean and land terrain model, via https://pressbooks.bccampus.ca/ewemodel/chapter/getting-bathymetry/. Note that land values of the bathymetry layer are 0s.

bathy <- rast(here("D1-data-prep-ssm", "data", "bathy_4_narwhals.tiff"))

Let’s plot the narwhal movement data over the bathymetry and land layers.

ggplot() +
  geom_spatraster(data = bathy) +
  geom_sf(data = land, fill = "beige") +
  geom_spatial_path(data = tracks_gps_O, 
                    aes(x = x, y = y, colour = ID), crs = 4326) +
  coord_sf(expand = FALSE)

As we will see during the week, many analyses require locations at regular time intervals; however, Fastloc GPS data are often collected at irregular intervals, since a fix depends on when the animal surfaces. When the data are irregular, there are two key decisions to make: (1) the temporal resolution to use, and (2) how to address large data gaps.

4.1.1 Selecting a time interval (resolution)

The desired resolution depends on the biological question you are asking, as different behaviours and biological processes occur at different spatial and temporal scales (e.g., seasonal migration, daily movement between foraging and resting grounds, and fine-scale foraging decisions), and on the resolution of the raw data you have. Generally, higher-resolution data are preferred as they carry more information. However, it is possible to have too high a resolution, wherein fine-scale variability drowns out the signal from coarse-scale patterns of interest (e.g., seasonal migration). A very coarse rule of thumb is that you want 3-50 data points per behaviour. For behaviours spanning several hours, that roughly corresponds to a desired resolution between 2 min and 60 min.
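As a back-of-envelope illustration of this rule of thumb (the 3-hour behaviour duration below is an assumed value for illustration, not from the data):

```r
# Implied resolution range for ~3-50 locations per behaviour,
# assuming a behaviour bout lasting 3 hours (illustrative value)
behaviour_min <- 3 * 60
coarsest <- behaviour_min / 3    # 60 min: only 3 locations per bout
finest   <- behaviour_min / 50   # 3.6 min: 50 locations per bout
c(finest = finest, coarsest = coarsest)
```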

First, let’s calculate the time difference between successive records using difftime and lead (compares current row to following row) and place these values in a new column called dt. Note that the time difference is in minutes (units = "mins"). For the last record of each individual (i.e., when ID != lead(ID)), we will set the value to NA.

# Calculate time difference between locations
tracks_gps_O <- tracks_gps_O %>%
  mutate(dt = ifelse(ID == lead(ID), # If next data row is same individual
                     # calculate time difference
                     difftime(lead(time), time, units = "mins"), 
                     NA))

Let’s see what resolutions may be possible in the data by looking at the most frequent time gaps.

# Visualise time differences (all and zoomed)
par(mfrow = c(1, 2))
hist(tracks_gps_O$dt, 1000, main = NA, xlab = "Time difference (min)")
hist(tracks_gps_O$dt, 1000, main = NA, xlab = "Time difference (min)",
     xlim = c(0,100))

# identify the most frequent dt
tracks_gps_O %>% 
  {table(.$dt)} %>% 
  sort(decreasing = TRUE) %>% 
  head()
## 
##  10  11  12  22   9  13 
## 400 145  90  64  63  63

We see that the most frequent time gap is \(10\) min, followed by \(11\), \(12\), \(22\), \(9\), and \(13\) min. We also see that the majority of the gaps are \(< 60\) min; however, some are in excess of \(600\) min. Finer resolutions will contain more data gaps. For some analyses, frequent and large data gaps can be difficult to handle, especially as the number of missing data points approaches or exceeds the number of existing data points. Let’s examine the potential data structure at different resolutions for the different animals.

We can now use the function p_na (in the script utility_functions.R) to look at the proportion of NAs we would get with 10, 20, 30, and 60 min resolution.
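The actual implementation of p_na is in utility_functions.R; as a rough sketch of the idea only, such a function might compute the proportion of slots on a regular time grid that contain no observation. The function body and name below are assumptions, not the workshop code:

```r
# Hypothetical sketch: proportion of empty slots on a regular grid
# of resolution res_min minutes (the real p_na may differ in detail)
p_na_sketch <- function(time, res_min) {
  mins <- as.numeric(difftime(time, min(time), units = "mins"))
  n_slots <- floor(max(mins) / res_min) + 1      # slots spanning the track
  1 - length(unique(floor(mins / res_min))) / n_slots
}

# toy track: locations at 0, 10, 20, and 40 min after the start
t <- as.POSIXct("2017-08-08 00:00:00", tz = "UTC") + c(0, 10, 20, 40) * 60
p_na_sketch(t, 10)  # 0.2: one of the five 10-min slots is empty
```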

# summarise track dt
tracks_gps_O %>% 
  group_by(ID) %>% 
  summarise(p_NA_10m = p_na(time, 10),     # 10 min 
            p_NA_20m = p_na(time, 20),     # 20 min 
            p_NA_30m = p_na(time, 30),     # 30 min 
            p_NA_60m = p_na(time, 60)) %>% # 60 min
  # return formatted table
  kable(digits = 3, col.names = c("Narwhal ID", paste(c(10,20,30,60), "m"))) %>%
  kable_styling() %>% 
  add_header_above(c("", "Resolution" = 4))
Resolution
Narwhal ID 10 m 20 m 30 m 60 m
T172062 0.486 0.236 0.191 0.152
T172064 0.502 0.203 0.120 0.074
T172066 0.488 0.230 0.185 0.160

Here we see that at the \(10\) min interval, around \(50\%\) of the locations would be missing.

This is a lot! In many cases, we may want to be more conservative and use a \(30\) min or \(60\) min resolution.

4.1.2 Handling data gaps

There are several ways to deal with data gaps:

  1. Split tracks
  2. Interpolate locations
  3. Fill the gaps with NAs
  4. Multiple imputation

The method to use will depend on the main analysis you are interested in.

4.1.3 Splitting tracks

One way to account for missing data is to split the track where there are large gaps (i.e., assign each track segment a new individual ID). This strategy is particularly appropriate when gaps are long enough that interpolation methods are unlikely to perform well. We can split the tracks wherever a gap is larger than a predetermined threshold.

Here, we will use a function (found in utility_functions.R) to split the tracks. We define the maximum allowable gap (at which point it will start a new segment), as well as the shortest allowable track segment.

These are somewhat arbitrary decisions, and depend on your subsequent choices for regularisation. In this tutorial, we will be interpolating missing locations (within each segment) and so we only want to allow gaps that can reasonably be predicted.

We are using a 30 min resolution, so we allow a 90 minute gap (i.e., we assume we can predict 2 missing locations), and we want each segment to be at least 120 min (i.e., have at least 5 locations) long so that we have enough information about state transitions.
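The real split_at_gap lives in utility_functions.R; as a minimal base-R sketch of the same idea (the function below is an assumption about its logic, not the workshop code):

```r
# Hypothetical sketch of gap-splitting logic: start a new segment at each
# gap larger than max_gap (minutes), then drop segments spanning less than
# shortest_track minutes
split_at_gap_sketch <- function(data, max_gap, shortest_track) {
  data <- data[order(data$ID, data$time), ]
  dt_min <- c(Inf, diff(as.numeric(data$time)) / 60)       # gaps in minutes
  same_id <- c(FALSE, data$ID[-1] == data$ID[-nrow(data)])
  new_seg <- dt_min > max_gap | !same_id                   # segment starts
  seg_num <- ave(as.numeric(new_seg), data$ID, FUN = cumsum)
  data$ID_old <- data$ID
  data$ID <- paste(data$ID_old, seg_num, sep = "-")
  span <- ave(as.numeric(data$time), data$ID,
              FUN = function(t) (max(t) - min(t)) / 60)    # segment duration
  data[span >= shortest_track, ]
}

# toy track with one 4-hour gap: splits into two 60-min segments
d <- data.frame(ID = "A",
                time = as.POSIXct("2017-08-08 00:00:00", tz = "UTC") +
                  c(0, 30, 60, 300, 330, 360) * 60)
unique(split_at_gap_sketch(d, max_gap = 90, shortest_track = 50)$ID)
```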

# Use function from utility_function.R to split data at gaps
track_split <- split_at_gap(data = tracks_gps_O, 
                           max_gap = 90, 
                           shortest_track = 120)

The new data has an updated ID column, which combines the original ID and the track segment number. The original ID is now in ID_old.

head(track_split)
##          ID                time        x       y loc_class dt  ID_old
## 1 T172062-1 2017-08-08 00:20:00 -80.6525 72.2728       GPS 22 T172062
## 2 T172062-1 2017-08-08 00:42:00 -80.6665 72.2719       GPS 10 T172062
## 3 T172062-1 2017-08-08 00:52:00 -80.6692 72.2674       GPS 17 T172062
## 4 T172062-1 2017-08-08 01:09:00 -80.6682 72.2582       GPS 10 T172062
## 5 T172062-1 2017-08-08 01:19:00 -80.6613 72.2574       GPS 10 T172062
## 6 T172062-1 2017-08-08 01:29:00 -80.6592 72.2563       GPS 15 T172062

These data are still irregular, but we now have smaller segments, split where there are large data gaps. Let’s visualise the different segments.

ggplot() +
  geom_spatraster(data = bathy) +
  geom_sf(data = land, fill = "beige") +
  geom_spatial_path(data = track_split, 
                    aes(x = x, y = y, colour = ID), crs = 4326) +
  coord_sf(expand = FALSE)

Splitting the tracks is often a first step, before interpolating or making other adjustments.

4.1.4 Interpolation (correlated random walk)

Once the track is split, there is often still irregularity within each segment, and we want to interpolate or predict new locations to form a regular grid of observations.

The simplest approach is to use linear interpolation between missing times, but a better option is to predict locations from a continuous-time correlated random walk (CTCRW). momentuHMM contains wrapper functions to interpolate missing locations by fitting a CTCRW to the data based on the crawl package by Devin Johnson and Josh London. There are many options to fit the CTCRW model, and a detailed tutorial for analysis with crawl is available here: https://jmlondon.github.io/crawl-workshop/crawl-practical.html. Let’s try to fit the most basic model using the wrapper function crawlWrap. In the most basic form, we only need to provide tracking data with the columns ID, time, x, and y and specify the desired temporal resolution.

First, let us transform the data into an sf object. crawlWrap can also take a data.frame as an argument, but that would require renaming some of our columns; it is easier to simply transform the data into an sf object.

Here it’s important to project the data into an appropriate coordinate system, since the model takes speed/step length into account.

track_split_sf <- track_split %>%
  st_as_sf(coords = c("x", "y")) %>% # converts to an sf object
  st_set_crs(4326) %>% # define CRS
  st_transform(2962) # reproject data to a UTM

Now we can fit the CTCRW to each track segment and create a data frame of predicted locations. We decided above that a good, perhaps conservative, time interval was 30 min. We will use that interval here. Since interpolating across large data gaps can bias our analysis, we use the split tracks.

The default model fitted is the simplest model. For example, it does not include the drift component described in Johnson et al. (2008) and we have not included covariates in the movement or error equations.

# crawl can fail to fit periodically, so I recommend always setting a seed 
set.seed(12)

# fit crawl
crwOut <- crawlWrap(obsData = track_split_sf, timeStep = "30 mins", theta = c(7, 0))
## Fitting 20 track(s) using crawl::crwMLE...
## Individual T172062-1...
## Individual T172062-2...
## Individual T172062-3...
## Individual T172062-4...
## Individual T172062-5...
## Individual T172062-6...
## Individual T172062-7...
## Individual T172064-1...
## Individual T172064-2...
## Individual T172064-3...
## Individual T172064-4...
## Individual T172064-5...
## Individual T172064-6...
## Individual T172066-1...
## Individual T172066-2...
## Individual T172066-3...
## Individual T172066-4...
## Individual T172066-5...
## Individual T172066-6...
## Individual T172066-7...
## DONE
## 
## Predicting locations (and uncertainty) at 30 mins time steps for 20 track(s) using crawl::crwPredict...
## DONE

Let’s look at the output of the model for the first track.

crwOut$crwFits$`T172062-1`
## 
## 
## Continuous-Time Correlated Random Walk fit
## 
## Models:
## --------
## Movement   ~ 1
## Error   
## 
## 
##                      Parameter Est. St. Err. 95% Lower 95% Upper
## ln sigma (Intercept)          8.098    0.224     7.659     8.536
## ln beta (Intercept)           0.106    0.250    -0.385     0.597
## 
## 
## Log Likelihood = -883.048 
## AIC = 1770.095
exp(crwOut$crwFits$`T172062-1`$estPar)
## [1] 3286.450263    1.111567

We get the log-likelihood value, the AIC value, and the estimated parameters. Here, the autocorrelation parameter, \(\hat{\beta}\), is 1.11 and the variability parameter, \(\hat{\sigma}\), is 3286.

Let’s look at a few track segments with interpolated values.

plot(crwOut, animals = "T172062-3", ask = FALSE)

plot(crwOut, animals = "T172064-5", ask = FALSE)

plot(crwOut, animals = "T172066-4", ask = FALSE)

Notice how the predicted tracks do not form perfectly straight lines through missing sections, an improvement over what simple linear interpolation would provide.
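For contrast, simple linear interpolation onto a regular grid can be done with base R's approx(). This toy sketch (made-up times and coordinates) shows what the CTCRW improves on: it produces only straight segments through gaps and provides no uncertainty estimates.

```r
# Toy irregular track (times in seconds, arbitrary projected coordinates)
obs_t <- c(0, 20, 100) * 60
obs_x <- c(0, 5, 12)
obs_y <- c(0, 2, -3)

# Linearly interpolate onto a regular 30 min grid
grid_t <- seq(0, 100 * 60, by = 30 * 60)
lin_x <- approx(obs_t, obs_x, xout = grid_t)$y
lin_y <- approx(obs_t, obs_y, xout = grid_t)$y
lin_x  # 0.000 5.875 8.500 11.125: straight lines between observations
```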

We can now extract the predicted regular tracks.

# Get predicted tracks from crawl output
track_int <- crwOut$crwPredict[which(crwOut$crwPredict$locType == "p"),
                              c( "ID", "mu.x", "mu.y", "time")]
colnames(track_int) <- c( "ID","x", "y", "time")
head(track_int)
##           ID         x       y                time
## 1  T172062-1 -285113.8 8176359 2017-08-08 00:20:00
## 4  T172062-1 -285813.8 8176130 2017-08-08 00:50:00
## 8  T172062-1 -286046.5 8174867 2017-08-08 01:20:00
## 11 T172062-1 -286045.6 8174408 2017-08-08 01:50:00
## 14 T172062-1 -285394.1 8174699 2017-08-08 02:20:00
## 17 T172062-1 -284706.5 8173150 2017-08-08 02:50:00

Note that the times are now exactly every 30 min.
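A quick way to verify this regularity, shown here as a self-contained sketch on a toy data frame shaped like the predicted tracks (column names taken from the output above):

```r
# Toy data frame mimicking the structure of the predicted tracks
toy <- data.frame(
  ID   = rep(c("T172062-1", "T172062-2"), each = 3),
  time = as.POSIXct("2017-08-08 00:20:00", tz = "UTC") +
    rep(c(0, 30, 60) * 60, 2)
)

# TRUE for each segment whose locations sit on a strict 30-min grid
regular <- tapply(toy$time, toy$ID,
                  function(t) all(diff(as.numeric(t)) == 30 * 60))
all(regular)  # TRUE
```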

4.1.5 Place NAs in (small) gaps

Instead of interpolating, one can leave data gaps as NAs. Some analyses, for example hidden Markov models without spatial covariates extracted from the locations, can easily handle NAs in the locations. The maximum size of gap to allow depends on the frequency of the missing data, the frequency of locations, the study species, and the behaviours of interest. However, for many analyses that assume a regular time interval, it is important to explicitly include the NAs, so that the function knows there are missing data to account for. Placing NAs for missing locations can be used on its own or in conjunction with splitting tracks. The package adehabitatLT has a function, setNA, dedicated to this.

We will apply this to the split tracks used previously in the tutorial. Here, instead of using crawl to interpolate missing locations, we simply create a data frame with the missing locations set to NA (i.e., a regular grid of observations with some NAs). The idea is that we still want to separate tracks into segments where there are large data gaps, but place NAs where just a few locations are missing.

The first step is to create an adehabitatLT trajectory. Here, we simply input the coordinates, time, and ID (here, the track segment ID).

track_ltraj <- as.ltraj(xy = track_split[, c("x", "y")], 
                                   date = track_split$time, 
                                   id = track_split$ID)

Now we set the NAs. As before, we use a time interval of 30 min. We specify a tolerance for imprecision in the data, here at most 15 min. We use the first location as the time reference.

# Create adehabitat object, containing the trajectory padded with NAs
track_NA <- setNA(ltraj = track_ltraj, 
                  date.ref = track_split$time[1], 
                  dt = 30, tol = 15, units = "min")

# Transform back to dataframe
track_NA <- ld(track_NA)[, c("id", "x", "y", "date")]
colnames(track_NA) <- c("ID", "x", "y", "time")
head(track_NA)
##          ID        x       y                time
## 1 T172062-1 -80.6525 72.2728 2017-08-08 00:20:00
## 2 T172062-1 -80.6692 72.2674 2017-08-08 00:52:00
## 3 T172062-1 -80.6592 72.2563 2017-08-08 01:29:00
## 4 T172062-1 -80.6558 72.2550 2017-08-08 02:04:00
## 5 T172062-1 -80.6316 72.2582 2017-08-08 02:26:00
## 6 T172062-1 -80.5892 72.2403 2017-08-08 03:02:00

We can now see that there are NAs for some locations, and that the times are not exactly every 30 min, only close to it. So this may not be adequate for analyses that require strict time regularity. One could round times to the nearest 30 min, but this could be problematic.
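If one does round, it can be done in base R (or with lubridate's round_date); this toy example (made-up times) shows how rounding shifts observations in time, which is one reason it can be problematic:

```r
# Round POSIXct times to the nearest 30 min in base R
t <- as.POSIXct(c("2017-08-08 00:20:00", "2017-08-08 00:52:00"), tz = "UTC")
t_round <- as.POSIXct(round(as.numeric(t) / (30 * 60)) * 30 * 60,
                      origin = "1970-01-01", tz = "UTC")
format(t_round, "%H:%M")  # "00:30" "01:00"
```

Note that 00:20 moves 10 min later and 00:52 moves 8 min later; two nearby observations can even round to the same time stamp, creating duplicates.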

4.1.6 Multiple Imputation

For many analyses where you need to extract covariates from the location data, it may be wise to use multiple imputation, where you create multiple tracks with a CTCRW model and extract the covariates for each of the tracks. Then you can propagate the uncertainty in other analyses.

Multiple imputation works as follows:

  1. Fit a CTCRW model to the original data.
  2. Draw (i.e., simulate) a number of realisations of the position process based on the CTCRW model.
  3. Fit your model of interest to each simulated realisation (not done here).
  4. Pool the estimated parameters.

momentuHMM has several functions to implement multiple imputation. The function MIfitHMM can be used both to simulate realisations of a fitted CTCRW and to fit hidden Markov models (HMMs) to each one. The number of simulations is specified with nSims. We can simulate realisations without fitting HMMs by setting fit = FALSE.

Here, let’s first fit a CTCRW model to the segmented tracks created above. We will then simulate 4 tracks using MIfitHMM.

set.seed(12)

# Fit the correlated random walk, MIfitHMM takes a crwData object
crw_gps_30 <- crawlWrap(obsData = track_split_sf, timeStep = "30 mins")
## Fitting 20 track(s) using crawl::crwMLE...
## Individual T172062-1...
## Individual T172062-2...
## Individual T172062-3...
## Individual T172062-4...
## Individual T172062-5...
## Individual T172062-6...
## Individual T172062-7...
## Individual T172064-1...
## Individual T172064-2...
## Individual T172064-3...
## Individual T172064-4...
## Individual T172064-5...
## Individual T172064-6...
## Individual T172066-1...
## Individual T172066-2...
## Individual T172066-3...
## Individual T172066-4...
## Individual T172066-5...
## Individual T172066-6...
## Individual T172066-7...
## DONE
## Predicting locations (and uncertainty) at 30 mins time steps for 16 track(s) using crawl::crwPredict...
## DONE
# simulate 4 realisations of the 30 min GPS CTCRW model
MI_sim_gps <- MIfitHMM(crw_gps_30, nSims = 4, fit = FALSE)
## Drawing 4 realizations from the position process using crawl...
## DONE
## DONE

This will return warning messages. They are not shown here, but one should look at the segments that produced warnings. This can potentially be fixed by forcing longer track segments above.

Let’s look at one segment: T172066-6.

ID_seg <- "T172066-6"
# plot locations for one example segment
# filter that segment from the original data
track_eg <- track_split_sf %>% 
  mutate(x = st_coordinates(track_split_sf)[,"X"], 
         y = st_coordinates(track_split_sf)[,"Y"]) %>% 
  filter(ID == ID_seg)

# filter the example segment from each simulation
sim_tracks <- lapply(MI_sim_gps$miData, function(x){
  filter(x, ID == ID_seg)})

# plot the original track for the example segment
plot(track_eg$x, track_eg$y, col = "red", xlab = "x", ylab = "y", asp = 1, type = "l")

# plot each simulated track
mute <- mapply(function(data, col) {
                 points(y ~ x, data = data, col = col, type = "l")
               }, data = sim_tracks, 
               col = list("cadetblue1", "cadetblue2", "cadetblue3", "cadetblue4"), 
               SIMPLIFY = FALSE)

Notice how in some areas the different simulations have generally good agreement in the likely location during gaps, while in others they diverge. Multiple imputation can be particularly powerful if we want to incorporate environmental variables, as spatially explicit variables can be extracted for each simulated track to sample the most likely conditions encountered by the animal.
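The final pooling step of the multiple-imputation workflow described above is typically done with Rubin's rules (the mitools package loaded earlier provides this, e.g. via MIcombine). A minimal base-R sketch with made-up numbers:

```r
# Made-up parameter estimates and standard errors from m = 4 imputed tracks
est <- c(0.52, 0.48, 0.55, 0.50)
se  <- c(0.05, 0.06, 0.05, 0.07)
m   <- length(est)

pooled_est  <- mean(est)      # pooled point estimate
within_var  <- mean(se^2)     # average within-imputation variance
between_var <- var(est)       # between-imputation variance
pooled_se   <- sqrt(within_var + (1 + 1/m) * between_var)

c(estimate = pooled_est, se = pooled_se)  # 0.5125, ~0.067
```

The pooled standard error is larger than any single imputation's, because it also carries the between-track (location) uncertainty.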

5 Filtering error of Argos data

5.1 Gentoo movement data

We will analyze a dataset containing movement tracks of three gentoo penguins tagged with Argos tags. The dataset belongs to Dr. Marie Auger-Méthé (University of British Columbia) and Dr. Glenn Crossin (Dalhousie University). They provided the data only for this tutorial; please do not use them for other purposes without their consent. Contact: .

Let’s load the data.

gentoo_tracks <- read.csv(
  here("D1-data-prep-ssm", "data", "tracks_argos_gentoo.csv")) %>%
  mutate(ID = factor(ID),
         datetime = ymd_hms(datetime))
  
head(gentoo_tracks)
##       ID            datetime Longitude  Latitude Qual
## 1 170902 2018-05-24 02:04:31 -59.19942 -51.41017    0
## 2 170902 2018-05-24 02:04:31 -59.19942 -51.41017    0
## 3 170902 2018-05-24 02:04:31 -59.19942 -51.41017    0
## 4 170902 2018-05-24 02:04:31 -59.19942 -51.41017    0
## 5 170902 2018-05-24 10:09:38 -59.23020 -51.41526    2
## 6 170902 2018-05-24 10:09:38 -59.23020 -51.41526    2

The columns/variables are:

  - ID: individual identifier
  - datetime: time of location
  - Longitude: longitude
  - Latitude: latitude
  - Qual: Argos location quality, one of 0, 1, 2, 3, A, B
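For reference, the quality classes correspond to nominal error radii. The values below are the commonly cited Argos class definitions (approximate upper bounds on the 1-sigma error), included as a rough guide rather than taken from the workshop materials:

```r
# Approximate nominal error radii for Argos location classes (metres).
# Class 0 has error > 1500 m; classes A and B carry no accuracy estimate.
argos_error_m <- c("3" = 250, "2" = 500, "1" = 1500,
                   "0" = Inf, "A" = NA, "B" = NA)
```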

There are often duplicates and records (rows) with missing data. We want to remove any row that is missing the latitude, longitude, or location quality, as well as duplicates.

gentoo_tracks <- gentoo_tracks %>% 
  filter(!is.na(Longitude) & !is.na(Latitude) & Qual != "",
         # remove identical records
         !(datetime == lag(datetime) & 
             Longitude == lag(Longitude) & 
             Latitude == lag(Latitude) & 
             Qual == lag(Qual)))

Here, the data are already sorted in order of ID and datetime, but this is not always the case. It’s a good habit to sort them appropriately.

gentoo_tracks <- gentoo_tracks %>% arrange(ID, datetime)

Let’s plot the data. Here, the coordinate system is WGS84 (i.e., regular lat/lon), so we use crs = 4326.

ggplot() +
  geom_spatial_path(data =  gentoo_tracks, 
                    aes(x = Longitude, y = Latitude, color = ID), 
                    crs = 4326)
## Warning: Ignoring transformation in stat_spatial_identity(). Use coord_sf()
## with a crs to project this layer.

Let’s look at the data gaps as above. First, let’s calculate the time difference dt.

# Calculate time difference between locations
gentoo_tracks <- gentoo_tracks %>%
  mutate(dt = ifelse(ID == lead(ID), # If next data row is same individual
                     # calculate time difference
                     difftime(lead(datetime), datetime, units = "mins"), 
                     NA))

Let’s look at the time differences between consecutive locations.

# Visualise time differences (all and zoomed)
par(mfrow = c(1, 2))
hist(gentoo_tracks$dt, 1000, main = NA, xlab = "Time difference (min)")
hist(gentoo_tracks$dt, 1000, main = NA, xlab = "Time difference (min)",
     xlim = c(0,2*60))

Here we look at the time series of time differences for each individual.

ggplot() +
  geom_line(data = gentoo_tracks, aes(x = datetime, y = dt/60)) +
  facet_wrap(~ID) +
  ylab("Time interval (hour)") +
  xlab("Date")

Let’s identify the most frequent time intervals.

# identify the most frequent dt
gentoo_tracks %>% 
  {table(round(.$dt))} %>% 
  sort(decreasing = TRUE) %>% 
  head(20)
## 
##   0   1   2   6 239   3   4  12  17   5  96  97  11 103 242 487 722   7  10  13 
##  37  25  22  18  15  14  12  11  11  10  10  10   9   9   8   8   8   7   7   7

We see that the most frequent time gap is \(0\) min, followed by \(1\), \(2\), and \(6\) min. But we also see large variability, including time gaps of roughly 4 hours (\(239\) min).

We can now use the function p_na (in the script utility_functions.R) to look at the proportion of NAs we would get with 2 hr (120 min), 4 hr (240 min), 8 hr (480 min), and 12 hr (720 min) resolutions.

dt_s <- c(2, 4, 8, 12) # hours
# summarise track dt
gentoo_tracks %>% 
  group_by(ID) %>% 
  summarise(p_NA_1 = p_na(datetime, dt_s[1]*60),    
            p_NA_2 = p_na(datetime, dt_s[2]*60),    
            p_NA_3 = p_na(datetime, dt_s[3]*60),  
            p_NA_4 = p_na(datetime, dt_s[4]*60)) %>% 
  # return formatted table
  kable(digits = 3, 
        col.names = c("Gentoo ID", 
                      paste(dt_s, "h"))) %>%
  kable_styling() %>% 
  add_header_above(c("", "Resolution" = 4))
Resolution
Gentoo ID 2 h 4 h 8 h 12 h
170902 0.705 0.530 0.194 0.082
170905 0.730 0.532 0.232 0.103
170914 0.731 0.546 0.295 0.138

The proportions of NAs estimated here are very approximate. It is worth looking further into this, as it is common for Argos data to have many locations in a row (i.e., clumps).

We may also want to split the tracks first. From the time series, it looks like the biggest gaps are on the order of 20 hr or more, so we will split at 20 hr. The function split_at_gap requires columns named ID and time, so we rename datetime beforehand.

gentoo_tracks <- gentoo_tracks %>% rename(time = datetime)
# Use function from utility_function.R to split data at gaps
gentoo_split <- split_at_gap(data = gentoo_tracks, 
                           max_gap = 20*60, 
                           shortest_track = 20*60)

Let’s re-explore the missing data points.

# summarise track dt
gentoo_split %>% 
  group_by(ID) %>% 
  summarise(p_NA_1 = p_na(time, dt_s[1]*60),    
            p_NA_2 = p_na(time, dt_s[2]*60),    
            p_NA_3 = p_na(time, dt_s[3]*60),  
            p_NA_4 = p_na(time, dt_s[4]*60)) %>% 
  # return formatted table
  kable(digits = 3, 
        col.names = c("Gentoo ID", 
                      paste(dt_s, "h"))) %>%
  kable_styling() %>% 
  add_header_above(c("", "Resolution" = 4))
Resolution
Gentoo ID 2 h 4 h 8 h 12 h
170902-1 0.600 0.500 0.000 0.000
170902-2 0.704 0.529 0.151 0.029
170902-3 0.717 0.500 0.143 0.056
170902-5 0.669 0.485 0.198 0.052
170902-6 0.659 0.467 0.083 0.000
170902-7 0.636 0.435 0.154 0.000
170902-8 0.593 0.357 0.067 0.000
170905-1 0.559 0.389 0.111 0.000
170905-10 0.706 0.494 0.140 0.034
170905-2 0.697 0.505 0.176 0.059
170905-5 0.700 0.455 0.091 0.000
170905-6 0.690 0.467 0.250 0.000
170905-7 0.689 0.381 0.083 0.000
170905-8 0.677 0.447 0.167 0.059
170905-9 0.703 0.432 0.105 0.000
170914-1 0.700 0.429 0.250 0.000
170914-10 0.731 0.432 0.190 0.000
170914-11 0.650 0.400 0.167 0.000
170914-13 0.625 0.267 0.125 0.000
170914-15 0.729 0.556 0.158 0.077
170914-16 0.531 0.320 0.061 0.000
170914-17 0.717 0.458 0.231 0.000
170914-3 0.455 0.167 0.000 0.000
170914-4 0.654 0.462 0.133 0.000
170914-5 0.741 0.500 0.357 0.000
170914-6 0.583 0.429 0.250 0.000
170914-8 0.737 0.536 0.125 0.000
170914-9 0.688 0.375 0.250 0.000

The results change when splitting, but it is still hard to find a perfect resolution. Something like 4 hours, or a multiple of 4 hours, appears to be a good option.

5.2 Using a state-space model to filter out the error, but not regularise

One way around this issue of time resolution is to use a state-space model that predicts the locations at the times the data were observed. This allows us to account for the error in Argos data without having to regularise the data. If the subsequent analyses allow for irregular time intervals, this may be a good option.

There are many state-space models that could be used to model the Argos error, including the crawl model explored above for the narwhal data, but here we use the model from the R package aniMotum.

One of the main choices one must make is which model to fit. Here we can model the underlying movement with a simple random walk (model = rw), a correlated random walk (model = crw, more directed), or the time-varying move persistence model (model = mp). We can see that the penguins move very differently when close to the colony than when travelling to forage. As such, the time-varying move persistence model, which allows the movement to shift from less to more directed, appears to be the best option.

We also need to tell the function fit_ssm which columns contain the animal ID, the coordinates, the date, and the location quality.

The function fit_ssm applies some prefiltering to the data before fitting the state-space model. To see the default values for things like the maximum travel rate and the minimum allowable time difference, look at the help file (?fit_ssm).

gentoo_mpi <- fit_ssm(gentoo_tracks,
                      id = "ID",
                      coord = c("Longitude", "Latitude"),
                      date = "time",
                      lc = "Qual",
                      model = "mp")
## fitting mp SSM to 3 tracks...
##  pars: ... (optimiser parameter trace for each of the 3 tracks omitted)
head(gentoo_mpi)
## # A tibble: 3 × 5
##   id     ssm           converged pdHess pmodel
##   <chr>  <named list>  <lgl>     <lgl>  <chr> 
## 1 170902 <mp_ssm [15]> TRUE      TRUE   mp    
## 2 170905 <mp_ssm [15]> TRUE      TRUE   mp    
## 3 170914 <mp_ssm [15]> TRUE      TRUE   mp

While we fit the model to all three gentoo penguins at the same time, the current package version fits the model independently to each individual (i.e., parameters are not shared across individuals).

The function summary provides information on convergence, the AICc, and the parameter estimates for each individual.

summary(gentoo_mpi)
##  Animal id Model Time n.obs n.filt n.fit n.pred n.rr converged   AICc
##     170902    mp    .   404     31   373      0    .      TRUE 4879.6
##     170905    mp    .   305     26   279      0    .      TRUE 4325.6
##     170914    mp    .   350     55   295      0    .      TRUE 4223.7
## 
## --------------
## 170902 
## --------------
##  Parameter Estimate Std.Err
##      rho_p  -0.5772  0.0753
##    sigma_x   1.0714  0.0827
##    sigma_y   1.3353   0.086
##      rho_o   0.5277  0.0795
##      tau_x   0.1163  0.0076
##      tau_y   0.0872  0.0066
##    sigma_g    0.062  0.0145
## 
## --------------
## 170905 
## --------------
##  Parameter Estimate Std.Err
##      rho_p   0.6507  0.0949
##    sigma_x   0.8562  0.1154
##    sigma_y   1.5687  0.1337
##      rho_o   -0.144  0.1157
##      tau_x   0.3776  0.0257
##      tau_y   0.1693  0.0126
##    sigma_g   0.0923  0.0248
## 
## --------------
## 170914 
## --------------
##  Parameter Estimate Std.Err
##      rho_p  -0.6943  0.0789
##    sigma_x   0.7484  0.0769
##    sigma_y   1.3258  0.1017
##      rho_o   0.0951  0.1299
##      tau_x   0.3281    0.02
##      tau_y   0.0916  0.0099
##    sigma_g   0.1252  0.0292

Here, the model converged for all tracks.
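Since the fit object is a tibble (as shown by head() above), we can also check convergence programmatically rather than by eye. A minimal sketch using the converged and pdHess columns from the output above:

```r
# TRUE only if every track converged with a positive-definite Hessian
all(gentoo_mpi$converged & gentoo_mpi$pdHess)

# List the ids of any problematic individuals (empty if all is well)
gentoo_mpi %>%
  filter(!converged | !pdHess) %>%
  pull(id)
```

This is handy when fitting many individuals at once, where scanning the printed table becomes impractical.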

The package also provides convenient plotting functions. With type = 1, we see longitude and latitude plotted separately over time. With type = 2, we see the tracks in two dimensions.

plot(gentoo_mpi, what ="fitted", type=1, ask = FALSE)
## $`170902`

## 
## $`170905`

## 
## $`170914`

plot(gentoo_mpi, what ="fitted", type=2, ask = FALSE)
## $`170902`

## 
## $`170905`

## 
## $`170914`

In both sets of plots, the blue circles are the observations used in the state-space model, the x symbols are the observations that were removed during prefiltering, and the orange circles and line are the fitted track returned by the model. In the type 1 plots, the rug displays the observation times.

We can use the function map to create a nicer map of the movement data. Here the individuals are plotted in the same figure, there is a base map of land masses, and the move persistence is represented by the colour gradient, with yellow representing more persistent movement.

aniMotum::map(gentoo_mpi, what = "fitted")
## using map scale: 10

By default, the map function normalises the move persistence values for each individual independently so that each track ranges between 0 and 1. To apply the normalisation so that the relative magnitudes of move persistence are preserved across individuals, you can do the following:

aniMotum::map(gentoo_mpi, what = "fitted", normalise = TRUE, group = TRUE)
## using map scale: 10

To analyse the data further or to plot it yourself, you can extract the data into a tibble using the function grab.

gentoo_mpi_t <- grab(gentoo_mpi)
head(gentoo_mpi_t)
## # A tibble: 6 × 11
##   id     date                  lon   lat      x      y  x.se  y.se logit_g
##   <chr>  <dttm>              <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>   <dbl>
## 1 170902 2018-05-24 10:09:38 -59.2 -51.4 -6593. -6662. 0.179 0.112   0.652
## 2 170902 2018-05-24 10:11:07 -59.2 -51.4 -6593. -6662. 0.183 0.123   0.652
## 3 170902 2018-05-25 02:12:32 -59.2 -51.4 -6594. -6661. 1.49  1.26    0.652
## 4 170902 2018-05-25 10:14:34 -59.2 -51.4 -6593. -6663. 1.64  1.31    0.652
## 5 170902 2018-05-25 11:31:21 -59.2 -51.4 -6592. -6663. 1.57  1.23    0.652
## 6 170902 2018-05-25 11:32:20 -59.2 -51.4 -6592. -6663. 1.56  1.22    0.652
## # ℹ 2 more variables: logit_g.se <dbl>, g <dbl>

We can see that some of the columns now have different names, such as id and date; these are the names used by the function fit_ssm. We also have the new locations estimated by the state-space model in lon and lat, and the move persistence g. In addition, we get the projected coordinates (x, y) with their standard errors, and logit_g with its standard error.
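Because grab returns an ordinary tibble, standard dplyr tools apply directly. For example, a quick per-individual summary of move persistence (a sketch; the column names match the head() output above):

```r
gentoo_mpi_t %>%
  group_by(id) %>%
  summarise(
    n_locs  = n(),               # number of fitted locations
    mean_g  = mean(g),           # average move persistence
    range_g = diff(range(g))     # spread of move persistence
  )
```

Summaries like this are a useful sanity check before moving on to mapping or downstream modelling.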

We can now plot the data ourselves.

ggplot() +
  geom_spatial_path(data =  gentoo_mpi_t, 
                    aes(x = lon, y = lat, lty = id), 
                    crs = 4326) +
  geom_spatial_point(data =  gentoo_mpi_t, 
                    aes(x = lon, y = lat, col = g), 
                    crs = 4326) +
  coord_sf(crs = 4326)

It is important to check whether the model is appropriate for the data. To do so, we can look at one-step-ahead residuals. This will take a few minutes to run.

res.crw <- osar(gentoo_mpi)
plot(res.crw, type = "ts")
## Warning: Removed 12 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 25 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 25 rows containing missing values or values outside the scale range
## (`geom_point()`).
## Warning: Removed 31 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 31 rows containing missing values or values outside the scale range
## (`geom_point()`).

plot(res.crw, type = "qq")

plot(res.crw, type = "acf")

We can see here that the model is not perfect. There is still some autocorrelation, and the fit is poor for very small and very large values. But overall it looks reasonable.

5.3 Using a state-space model to account for Argos error and regularise

For some downstream analyses, we may need to regularise tracks. As we saw above, it can be hard to find the perfect time interval, and the choice will depend on the question and the analysis you will do afterwards. Here we will use 4 hours. We chose this interval because the foraging trips of the penguins last only a few hours, and going coarser would lose all the information from the trips.
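One way to guide the choice of time step is to look at the observed sampling intervals. A minimal sketch, assuming gentoo_tracks has the ID and time columns passed to fit_ssm below:

```r
gentoo_tracks %>%
  group_by(ID) %>%
  arrange(time) %>%
  summarise(
    median_gap_h = median(as.numeric(diff(time), units = "hours")),
    max_gap_h    = max(as.numeric(diff(time), units = "hours"))
  )
```

A time step much finer than the median gap mostly interpolates, while one much coarser than the typical trip duration smooths the behaviour away.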

Again, we can use the function fit_ssm from the package aniMotum. The only difference is that now we will include a time step of 4 hours.

gentoo_mpr <- fit_ssm(gentoo_tracks,
                      id = "ID",
                      coord = c("Longitude", "Latitude"),
                      date = "time",
                      lc = "Qual",
                      model = "mp",
                      time.step = 4)
## fitting mp SSM to 3 tracks...
##  pars: ... (optimiser parameter trace for each of the 3 tracks omitted)
head(gentoo_mpr)
## # A tibble: 3 × 5
##   id     ssm           converged pdHess pmodel
##   <chr>  <named list>  <lgl>     <lgl>  <chr> 
## 1 170902 <mp_ssm [15]> TRUE      TRUE   mp    
## 2 170905 <mp_ssm [15]> TRUE      TRUE   mp    
## 3 170914 <mp_ssm [15]> TRUE      TRUE   mp

As we can see, they all converged and none had issues with the Hessian (pdHess = TRUE).

We can now go through the same process as above, but this time we look at the predicted rather than the fitted locations, since when we regularise we predict locations at times where we do not necessarily have observations.

plot(gentoo_mpr, what ="predicted", type=1, ask=FALSE)
## $`170902`

## 
## $`170905`

## 
## $`170914`

plot(gentoo_mpr, what ="predicted", type=2, ask=FALSE)
## $`170902`

## 
## $`170905`

## 
## $`170914`

Map the tracks.

aniMotum::map(gentoo_mpr, what = "predicted", normalise = TRUE, group = TRUE)
## using map scale: 10

Grab the data for other analyses. If you don’t use the argument what = "predicted", grab will return the fitted data at irregular time intervals.

gentoo_mpr_t <- grab(gentoo_mpr, what = "predicted")
head(gentoo_mpr_t)
## # A tibble: 6 × 11
##   id     date                  lon   lat      x      y  x.se  y.se logit_g
##   <chr>  <dttm>              <dbl> <dbl>  <dbl>  <dbl> <dbl> <dbl>   <dbl>
## 1 170902 2018-05-24 10:00:00 -59.2 -51.4 -6593. -6662. 0.320 0.342    2.34
## 2 170902 2018-05-24 14:00:00 -59.2 -51.4 -6594. -6661. 3.21  3.83     2.34
## 3 170902 2018-05-24 18:00:00 -59.2 -51.4 -6594. -6661. 4.17  4.93     2.34
## 4 170902 2018-05-24 22:00:00 -59.2 -51.4 -6594. -6661. 3.30  3.80     2.34
## 5 170902 2018-05-25 02:00:00 -59.2 -51.4 -6594. -6661. 1.49  1.36     2.34
## 6 170902 2018-05-25 06:00:00 -59.2 -51.4 -6594. -6662. 2.14  2.32     2.34
## # ℹ 2 more variables: logit_g.se <dbl>, g <dbl>

Notice that the locations are now at regular 4-hour intervals.
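We can verify the regular spacing directly from the grabbed tibble (a quick sketch using the date column shown above):

```r
gentoo_mpr_t %>%
  group_by(id) %>%
  summarise(
    gaps_h = list(unique(as.numeric(diff(date), units = "hours")))
  ) %>%
  unnest(gaps_h)   # should be 4 for every individual
```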

5.4 Multiple simulated tracks for multiple imputation

To do multiple imputation, we can simulate tracks from the posterior. Here we simulate 4, just to be able to visualise them. This process is similar to the one we used for the narwhal data above.

The aniMotum vignette states: “Posterior simulations from an SSM fit provide a useful characterization of track uncertainty. The sim_post function provides an efficient method for simulating tracks from the joint uncertainty of the estimated locations.”

set.seed(2025)
psim <- sim_post(gentoo_mpr, what = "predicted", reps = 4)
# Grab the data to be able to use it
gentoo_post <- psim %>% 
  unnest(cols = psims) %>% 
  mutate(rep = factor(rep))

Plot the tracks created by the simulations.

ggplot() +
  geom_spatial_path(data = gentoo_post, 
                    aes(x = lon, y = lat, colour = rep),
                    lwd = 0.3,
                    crs = 4326) +
  geom_spatial_point(data = gentoo_post, 
                    aes(x = lon, y = lat, pch = id, colour = rep),
                    crs = 4326) +
  coord_sf(crs = 4326)

We can see the slightly different paths. Replicate 0 is the original track produced by the state-space model.
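To get a feel for the track uncertainty these simulations represent, we can summarise how far each simulated replicate strays from replicate 0. A sketch, assuming gentoo_post has the id, rep, date, lon, and lat columns used in the plot above:

```r
gentoo_post %>%
  filter(rep != "0") %>%
  # join each simulated replicate to the SSM track (replicate 0)
  left_join(
    gentoo_post %>%
      filter(rep == "0") %>%
      select(id, date, lon0 = lon, lat0 = lat),
    by = c("id", "date")
  ) %>%
  group_by(id, rep) %>%
  summarise(
    # crude mean deviation in degrees; geosphere could convert to metres
    mean_dev_deg = mean(sqrt((lon - lon0)^2 + (lat - lat0)^2)),
    .groups = "drop"
  )
```

In a full multiple-imputation workflow you would fit the downstream model to each replicate and pool the results, rather than just visualising the spread.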

6 Literature cited

Johnson, DS, JM London, MA Lea, JW Durban. 2008. Continuous-time correlated random walk model for animal telemetry data. Ecology 89: 1208-1215.